[SYSTEMDS-3840] Builtin scripts parameter consolidation#2228
[SYSTEMDS-3840] Builtin scripts parameter consolidation#2228e-strauss wants to merge 1 commit intoapache:mainfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2228 +/- ##
=============================================
- Coverage 72.47% 21.64% -50.83%
+ Complexity 45440 14140 -31300
=============================================
Files 1469 1469
Lines 170873 170873
Branches 33315 33315
=============================================
- Hits 123836 36993 -86843
- Misses 37622 130547 +92925
+ Partials 9415 3333 -6082 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d60724e to
2b29972
Compare
|
I just squashed the commits and removed the changes in the github action configs |
|
LGTM, we just need to resolve the merge conflicts. |
This commit consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly. The changes in the builtin scripts arguments: order_rand -> Order search_iterations -> iter batch_size -> batchSize n_bins -> n ytest -> Ytest max_iter -> maxIter max_func_invoc, max_iterations -> maxIter include_mean -> includeMean icpt -> intercept maxi -> maxIteration maxii -> maxInnerIteration cMask -> ctypes is_verbose -> verbose k_max -> maxK kmax -> iter (start/end)_stepsize -> (start/end)Stepsize (start/end)_vicinity -> (start/end)Vicinity sim_seed -> seed k_value -> k CL_T -> ctypes trans_continuous -> tranCont select_k -> selectK k_min -> minK k_max -> maxK select_feature -> selectFeature feature_max -> maxFeatures feature_importance -> featureImportance predict_con_tg -> predictCont START_SELECTED -> initSelectFeature frequency_threshold -> frequencyThreshold distance_threshold -> distanceThreshold is_verbose -> verbose thresh -> threshold reduced_dims -> reducedDims max_iter -> maxIter is_verbose -> verbose print_iter -> printIter min_leaf -> minLeaf min_split -> minSplit num_trees -> numTrees max_depth -> maxDepth max_features -> maxFeatures max_values -> maxValues max_dataratio -> maxDataRatio sample_frac -> sampleFrac feature_frac -> featureFrac window_size -> windowSize sample_percent -> sampleFrac is_verbose -> verbose alpha -> lr batch_size -> batchSize learning_rate -> lr out_activation -> outActivation loss_fcn -> lossFn validation_split -> validationSplit types -> ctypes sml_type -> smlType num_trees -> numTrees learning_rate -> lr max_depth -> maxDepth lambda -> reg icpt -> intercept maxi, moi, iter -> maxIter mii -> maxInnerIter epsilon -> tol maxIterations -> maxIter maxii -> maxInnerIter eps -> tol max_iter, iterations -> maxIter is_verbose -> verbose avg_sample_size_per_centroid -> avgSampleSizePerCentroid space_decomp -> spaceDecomp n_components -> nComponents init_param -> initParams reg_covar -> reg rnk -> rank eps -> tolerance maxi -> maxIter dth, thr -> threshold maxi -> maxIter dth, thr -> threshold maxi -> maxIter
2b29972 to
7e0ae37
Compare
|
@mboehm7 Hi, can we merge in the changes ? or is there something I should change first |
|
@mboehm7 , I did look though the changes and they seem reasonable. However, you (@mboehm7) voiced some concerns on the changes. Could you give a LGTM, merge, or voice your concerns here? |
This PR consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly.
The changes also resulted in a couple changes in the test script for the builtin scripts. For the discovery of the usages of the changed builtins, I had to manually "grep" the ./src/test/scripts folder, so that I dont break any tests, but also the ./script folder to catch the usages in tutorials, perftest, etc. I hope, I've catched everything there, but I can not guaranty it, since the github test just run on test script folder.
I applied the following formatting convention:
allowed:
X, Xtest, ScaleFactor,
tol, maxIter
not allowed:
X_test, max_iter
Furtheremore, I used the following python script / notebook for the discovery of all unique parameter names:
https://gist.github.com/e-strauss/ccec59f504ed3882a2141d859ff681ec
In the following, I want to give a quick overview over the consolidations, that I applied.
In total, I found 62 matches for consolidation and reduced the total number of unique parameters to 340, which was previously at 402.
I changed the spelling of the following 76 parameters names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa
Through either consolidation or applying the naming convention, I removed the following 138 parameter names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa
A list of all current parameters can be found here:
https://gist.github.com/e-strauss/8aad1143b016d7bda28f3884038d62d3